The indexsheet DTD defines the format to which indexsheets must conform.
<!-- DTD for XIL Based on XSLT Copyright (c) 1998-2023, Rocket Software, Inc. -->
<!-- definitions -->
<!ELEMENT np:definitions (field|facet|facet-name-rules)*>
<!ELEMENT field EMPTY>
<!ATTLIST field
  name CDATA #REQUIRED
  type (text|long|double|date|time|datetime) "text"
  relevance (normal|high|higher|highest) "normal"
  picture CDATA #IMPLIED
  index (yes|no) "yes"
  exclusive (yes|no) "no"
  term-list (yes|no) "no"
  phrase (yes|no) "no"
  toc-section (yes|no) "no"
  stop-words (yes|no) "no"
  proximity (yes|no) "yes"
  date-2000 (yes|no) "no">
<!ELEMENT facet EMPTY>
<!ATTLIST facet
  name CDATA #REQUIRED
  query CDATA #REQUIRED>
<!ELEMENT facet-name-rules (rule*)>
<!ELEMENT rule (rule*)>
<!ATTLIST rule
  match CDATA #IMPLIED
  find CDATA #IMPLIED
  replace CDATA #IMPLIED
  stop (yes|no) "no">
<!-- XIL specific part -->
<!ELEMENT np:index-attribute EMPTY>
<!ATTLIST np:index-attribute
  name CDATA #REQUIRED
  field CDATA #IMPLIED
  field-name-attribute CDATA #IMPLIED
  field-element-name (yes|no) 'no'
  facet CDATA #IMPLIED
  facet-name-attribute CDATA #IMPLIED
  facet-element-name (yes|no) 'no'>
<!ELEMENT np:index (np:index-attribute*, xsl:apply-templates)>
<!ATTLIST np:index
  field CDATA #IMPLIED
  field-name-attribute CDATA #IMPLIED
  field-element-name (yes|no) 'no'
  title-field CDATA #IMPLIED
  facet CDATA #IMPLIED
  facet-name-attribute CDATA #IMPLIED
  facet-element-name (yes|no) 'no'
  toc-heading (yes|no|HTML|title-HTML) 'no'
  toc-section (yes|no) 'no'
  break-word (yes|no) 'no'
  proximity (paragraph|sentence) #IMPLIED
  hidden (yes|no) 'no'
  remove (yes|no) 'no'
  index (yes|no) 'yes'
  hit-anchor (yes|no|postpone) 'yes'
  hit-hilite (yes|no) 'yes'
  hit-total (yes|no) 'no'
  relevance (normal|high|higher|highest) 'normal'>
<!ATTLIST xsl:stylesheet
  case-sensitive (yes|no) 'yes'
  xmlns:xsl CDATA #IMPLIED
  xmlns:np CDATA #IMPLIED
  extension-element-prefixes CDATA #IMPLIED>
<!ELEMENT np:preprocess (xsl:template+)>
<!ATTLIST np:preprocess
  command CDATA #REQUIRED
  content-type CDATA #IMPLIED
  encoding CDATA #IMPLIED
  indexsheet CDATA #IMPLIED>
<!ELEMENT np:property EMPTY>
<!ATTLIST np:property
  name CDATA #REQUIRED
  field CDATA #IMPLIED
  toc-heading (yes|no) 'no'>
<!-- subset of XSLT -->
<!ELEMENT xsl:stylesheet (np:definitions?, xsl:template*, np:preprocess?, np:property*)>
<!-- Used for attribute values that are patterns.-->
<!ENTITY % pattern "CDATA">
<!ELEMENT xsl:template (np:index?, np:index-attribute*)>
<!ATTLIST xsl:template match %pattern; #REQUIRED>
<!ELEMENT xsl:apply-templates EMPTY>
The following elements and attributes are used in indexsheets:
xsl:stylesheet | Defines an indexsheet made up of indexing rules. |
xsl:template | Specifies an indexing rule made up of a pattern and an action. |
pattern | Specifies a pattern to match. |
np:index | Specifies an indexing action to perform. |
np:index-attribute | Specifies an indexing action to perform on meta data. |
np:definitions | Encloses a set of field, facet, and facet-name-rules elements that define the names of fields and facets and their properties. |
facet | Declares a query-based facet that generates one facet value covering multiple documents. |
facet-name-rules | Root element that includes the rules for transformation of facet names and values. |
field | Defines a named field that can be applied to a portion of a document using the np:index element. |
rule | Defines the rule for name transformation. |
Defines an indexsheet made up of indexing rules. An indexsheet must be defined inside an indexsheet element when included in a content collection makefile. xsl:stylesheet is the root element for standalone XIL indexsheets.
<!ELEMENT xsl:stylesheet (np:definitions?, xsl:template*, np:preprocess?, np:property*)>
<!ATTLIST xsl:stylesheet
  case-sensitive (yes|no) 'yes'
  xmlns:xsl CDATA #IMPLIED
  xmlns:np CDATA #IMPLIED
  extension-element-prefixes CDATA #IMPLIED>
Attribute | Description |
---|---|
case-sensitive | When set to yes, element names are case sensitive. Setting this attribute to no is recommended when element names do not have consistent case (often seen in HTML documents). The default is yes. |
Use the xsl:template element to define indexing rules for an indexsheet.
<xsl:stylesheet case-sensitive='no'>
  <xsl:template match='Creator'>
    <np:index field="Creator">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <xsl:template match='ACT/TITLE'>
    <np:index field="act title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <xsl:template match='SCENE/TITLE'>
    <np:index field="scene title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <xsl:template match='RDF/Description/Format'>
    <np:index break-word="yes" field="Format">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <xsl:template match='meta[attribute(name)="keywords"]'>
    <np:index-attribute name="content" field="keywords"/>
  </xsl:template>
  <xsl:template match='meta'>
    <np:index-attribute name="content" field-name-attribute="name"/>
  </xsl:template>
</xsl:stylesheet>
Specifies an indexing rule made up of a pattern and an action. Indexing rules must be defined inside an xsl:stylesheet element.
Attribute | Description |
---|---|
match | The pattern to match against the source node or nodes to which the rule applies. |
An indexing rule is made up of a pattern-action pair. The pattern specifies an element to match. The action specifies the action to perform on matched elements. Index actions are defined using the np:index element.
See xsl:stylesheet.
Specifies a string which is matched against an element in a source document.
<!ENTITY % pattern "CDATA">
The most common pattern specifies the element type name of a matching element. For example, the pattern emph matches an element whose type is emph.
More complex patterns specify the element types of ancestors of a matching element. For example, the pattern olist/item matches an element with an item type and a parent element type of olist.
These are some additional examples:
Type of Match | Example Pattern | Description |
---|---|---|
Element | TITLE | Matches any TITLE element. |
Element with parents | ACT/TITLE | Matches a TITLE whose direct parent is an ACT element. |
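Element with ancestors | ACT//TITLE | Matches a TITLE with an ancestor that is an ACT element. |
Multiple parents | ACT/SCENE/TITLE | Matches a TITLE whose direct parent is a SCENE element whose direct parent is an ACT element. |
Multiple ancestors | ACT//SCENE//TITLE | Matches a TITLE with a SCENE ancestor that in turn has an ACT ancestor. |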
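As a minimal sketch of how hierarchy patterns are used in rules, the fragment below pairs a direct-parent pattern with an ancestor pattern. The field names are hypothetical, and the namespace declarations on xsl:stylesheet are omitted because their values are not given in this reference.

```xml
<xsl:stylesheet>
  <!-- Direct parent: matches only a TITLE immediately inside an ACT -->
  <xsl:template match="ACT/TITLE">
    <np:index field="act title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <!-- Ancestor: matches a TITLE at any depth below a SCENE -->
  <xsl:template match="SCENE//TITLE">
    <np:index field="scene title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```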
In addition to matching elements based on hierarchy, you can match elements based on their attributes. Any element, parent, or ancestor in a pattern can have attributes, except the root and wildcard elements described below, which cannot. An attribute is specified in a pattern with the syntax @name. Some of the examples in this section use the longer form attribute(name).
These are some example patterns.
Type of Match | Example Pattern | Description |
---|---|---|
Element | TITLE | Matches any TITLE element. |
Element with attribute | TITLE[@name] | TITLE element with a value specified for the name attribute. |
Element with attribute | 'TITLE[@name="Bob"]' | TITLE element with "Bob" specified for the name attribute. |
Element with attributes | "TITLE[@name, attribute(id)]" | TITLE element with a value specified for the name attribute and for the id attribute. Separate attributes using commas. |
Note that attribute values may be in either double or single quotes; however, the whole pattern must use the opposite quote. Both "TITLE[attribute(name)='Bob']" and 'TITLE[attribute(name)="Bob"]' are valid.
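The quoting rule can be sketched with two rules that use opposite quote styles. The SPAN element, the attribute values, and the field names are hypothetical and serve only to illustrate the quoting.

```xml
<xsl:stylesheet>
  <!-- Pattern in single quotes, attribute value in double quotes -->
  <xsl:template match='TITLE[@name="Bob"]'>
    <np:index field="bob title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <!-- Pattern in double quotes, attribute value in single quotes -->
  <xsl:template match="SPAN[@class='note']">
    <np:index field="note">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```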
Type of Match | Example Pattern | Description |
---|---|---|
Root | "/" | Matches the root element. |
Root as parent | "/ACT/TITLE" | Matches a TITLE whose parent is an ACT element at the root, that is, an ACT element with no parent. |
The * pattern is a wildcard that matches a single element of any type. When used within an ancestry chain, the wildcard matches exactly one level of hierarchy. Only a standalone "*" is allowed for each element. For example, "T*" would not be valid. The "*" pattern is not allowed as the target element.
Type of Match | Example Pattern | Description |
---|---|---|
Wildcard | ACT/*/TITLE | Matches a TITLE whose parent is any element which has as its parent an ACT element. |
Patterns can be combined with the '|' symbol, which represents the Boolean OR operator. This is shorthand that adds no functionality: it is equivalent to writing two separate templates that are identical except for the pattern.
Type of Match | Example Pattern | Description |
---|---|---|
OR | "ACT/TITLE|SCENE/TITLE" | Matches ACT/TITLE elements and SCENE/TITLE elements. |
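To illustrate the equivalence, the sketch below shows a combined rule and, in a comment, the two separate rules it stands for. The field name is hypothetical.

```xml
<xsl:stylesheet>
  <!-- One rule covering both patterns -->
  <xsl:template match="ACT/TITLE|SCENE/TITLE">
    <np:index field="title">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <!--
    Equivalent long form: two rules that differ only in the pattern.
    <xsl:template match="ACT/TITLE"> ... same np:index action ... </xsl:template>
    <xsl:template match="SCENE/TITLE"> ... same np:index action ... </xsl:template>
  -->
</xsl:stylesheet>
```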
No extra white space is allowed in patterns (except in element values) even though XSL allows it.
See xsl:stylesheet.
Specifies an indexing action.
<!ELEMENT np:index (np:index-attribute*, xsl:apply-templates)>
<!ATTLIST np:index
  field CDATA #IMPLIED
  field-name-attribute CDATA #IMPLIED
  field-element-name (yes|no) 'no'
  title-field CDATA #IMPLIED
  facet CDATA #IMPLIED
  facet-name-attribute CDATA #IMPLIED
  facet-element-name (yes|no) 'no'
  toc-heading (yes|no|HTML|title-HTML) 'no'
  toc-section (yes|no) 'no'
  break-word (yes|no) 'no'
  proximity (paragraph|sentence) #IMPLIED
  hidden (yes|no) 'no'
  remove (yes|no) 'no'
  index (yes|no) 'yes'
  hit-anchor (yes|no|postpone) 'yes'
  hit-hilite (yes|no) 'yes'
  hit-total (yes|no) 'no'
  relevance (normal|high|higher|highest) 'normal'>
The following attributes apply to the entire matched element, including any children such as text:
Attribute | Description |
---|---|
field | Name of the field to apply to matched elements. Fields can be defined within the indexsheet or the content collection makefile. See np:definitions for information on defining fields within an indexsheet. |
field-name-attribute | Use the attribute value as the name of the field to apply. The field name is specified in the source document rather than in the indexsheet. You can add a field attribute to an element in source data or use an existing attribute such as the class attribute common to SPAN and DIV elements. Without the field-name-attribute you would need to write a separate rule for each unique field applied. |
field-element-name | The same as field-name-attribute, except that the name of the selected element is used as the field name. The default XML indexsheet uses field-element-name to name fields based on element names. |
facet | Close analog of the field attribute. Defines the name of the facet created for elements matched by the xsl:template rule, just as field defines the name of a field. Several entries can be listed, separated by commas. |
facet-name-attribute | Close analog of the field-name-attribute attribute. Names the attribute whose value becomes the facet name. |
facet-element-name | Close analog of the field-element-name attribute. When set to yes, the element name becomes the facet name. |
title-field | If you use the title-HTML option of toc-heading (see below), then the first element in the document that matches an index rule marked with title-HTML is used as the title, and the title-field is applied to that element. |
toc-heading | Marks matched elements as table of contents headings. You can specify yes, no, HTML, or title-HTML. The default is no. Specifying HTML generates table of contents headings based on H1...H6. HTML works only for H1...H6 because the hierarchical order of the tags must be known to generate the TOC. title-HTML supports the special case in which the document title is the first instance of any H1...H6 heading encountered and the remaining H1...H6 headings generate the TOC structure as if you had specified HTML. |
toc-section | Identifies structural elements that belong in the table of contents (TOC). Specify yes or no. The default is no. For example, a subsection element in an XML document is marked with toc-section and a child element that represents the heading for the subsection is marked with toc-heading. Together, the subsection element and the heading element define an entry in the table of contents. The toc-section attribute is not used when toc-heading is set to HTML or title-HTML because in HTML, H1-H6 only identify headings and do not mark structural elements. |
proximity | Make the begin tag of a matched element specify the proximity value for searches. You can specify paragraph or sentence. |
hidden | Specifies whether or not to hide text within the field element. You can specify yes or no. The default is no. Hidden text is indexed, but not displayed. You could use hidden text to index descriptions of graphics. Users could search graphics based on their descriptions, while the results show only the graphics without their descriptions. |
index | Specifies whether or not to index text within the field element. You can specify yes or no. The default is yes. |
hit-anchor | Specifies where to place a hit-anchor within the element. You can specify yes, no, or postpone. The default is yes which allows a hit-anchor code to be placed within the element. The no value ignores the hit-anchor code and postpone postpones the hit-anchor code until one is allowed. This attribute is used with links (such as <A HREF=...> ) which do not allow anchor codes (such as <A NAME=...> ) within them. |
hit-hilite | The hit-hilite attribute is used for elements not normally seen by the user, but where hits may occur, such as the <HEAD> tag. You can specify yes or no. The default is yes. The yes value allows hit highlighting within the element; no disables hit highlighting. |
hit-total | The hit-total attribute is used by XML to specify in which element to place the total hit count. You can specify yes or no. The default is no. The yes value specifies that the element should contain the total hit count; no specifies that no hit count should be placed in the element. |
relevance | Adjusts the relevance weight for indexed terms within the element. Allowed values are normal, high, higher, and highest. The default is normal. For example, text within titles, headings, and keyword lists is usually weighted higher than other text. |
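The following sketch shows two of the attributes above in rules, based on the cases their descriptions mention: naming fields from the class attribute of SPAN elements, and postponing the hit anchor inside links. The patterns are illustrative assumptions, not taken from a shipped indexsheet.

```xml
<xsl:stylesheet case-sensitive="no">
  <!-- The field name is taken from each SPAN's class attribute value -->
  <xsl:template match="SPAN[@class]">
    <np:index field-name-attribute="class">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <!-- Links do not allow anchor codes inside them, so postpone the hit anchor -->
  <xsl:template match="A[@HREF]">
    <np:index hit-anchor="postpone">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```

With field-name-attribute, one rule covers every class value, so no separate rule is needed per field.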
The following attributes apply to the tags of the matched element only:
Attribute | Description |
---|---|
break-word | Specifies whether or not the matched element breaks words. You can specify yes or no. The default is no. For example, by default the following is indexed as one term, Joel: <BigFont>J</BigFont>oel. However, there are times when it is desirable to have tags break words. By default the following is indexed as AApple, BBat, CCat: <Letter>A</Letter><Word>Apple</Word> <Letter>B</Letter><Word>Bat</Word> <Letter>C</Letter><Word>Cat</Word>.... To have the text indexed as A Apple, B Bat, C Cat, set the break-word attribute to yes for a field applied when the Word element is matched. |
remove | Specifies whether or not to remove the tags for a matched element before storing the document in the content collection. You can specify yes or no. The default is no. Reasons to use the remove option include: |
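The break-word case from the table can be sketched as a single rule on the Word element from that example. The field name is hypothetical.

```xml
<xsl:stylesheet>
  <!-- The Word tags now break words, so "A" and "Apple" are indexed as separate terms -->
  <xsl:template match="Word">
    <np:index field="word" break-word="yes">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```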
xsl:apply-templates is required as a child of np:index.
Using toc-heading=HTML generates TOC structure from H1 to H6 elements.
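For XML sources, the toc-section and toc-heading attributes work as a pair, as described in the toc-section entry. A minimal sketch, assuming a source document with subsection elements that each contain a heading child (both element names are hypothetical):

```xml
<xsl:stylesheet>
  <!-- The structural element that forms a TOC entry -->
  <xsl:template match="subsection">
    <np:index toc-section="yes">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
  <!-- The child element whose text becomes the heading of that entry -->
  <xsl:template match="subsection/heading">
    <np:index toc-heading="yes">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```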
See xsl:stylesheet.
Specifies indexing for element attributes such as meta data.
<!ELEMENT np:index-attribute EMPTY>
<!ATTLIST np:index-attribute
  name CDATA #REQUIRED
  field CDATA #IMPLIED
  field-name-attribute CDATA #IMPLIED
  field-element-name (yes|no) 'no'
  facet CDATA #IMPLIED
  facet-name-attribute CDATA #IMPLIED
  facet-element-name (yes|no) 'no'>
Attribute | Description |
---|---|
name | Name of the attribute in the selected element. The value of this attribute is indexed. Elements are selected using pattern matching with the xsl:template element. |
facet | Close analog of the field attribute of the np:index element. Defines the name of the facet created by the xsl:template rule, just as field defines the name of a field. Several entries can be listed, separated by commas. |
facet-element-name | Close analog of the field-element-name attribute. When set to yes, the element name becomes the facet name. |
facet-name-attribute | Close analog of the field-name-attribute attribute. Names the attribute whose value becomes the facet name. |
field | Name of the field to apply to the attribute value specified in the name attribute. Use either field or field-name-attribute to define the field's name, but not both. For example, you define the xsl:template pattern to find "content" attributes within the meta element and define the field name as "meta-content". If the value of the "content" attribute is "Microsoft FrontPage 2.0", a field called "meta-content" is applied to "Microsoft FrontPage 2.0". |
field-name-attribute | Uses the value of the attribute specified as the field name. Use either field or field-name-attribute to define the field's name, but not both. For example, you define the xsl:template pattern to find "content" attributes within the meta element. A field is applied to the value of "content" and the field name is "content." |
See xsl:stylesheet.
Specifies the inclusion of fields in the indexsheet.
<!ELEMENT np:definitions (field|facet|facet-name-rules)*>
This element has no attributes.
The np:definitions element is an optional child element of xsl:stylesheet. If np:definitions is included, it must be the first child. The np:definitions element contains the same field elements as those found in the content collection makefile. For more information, see the field element.
<np:definitions>
  <field name="dc:title" type="text" term-list="yes" proximity="no" relevance="highest" />
  <field name="dc:creator" type="text" term-list="yes" proximity="no" relevance="highest" />
  <field name="dc:subject" type="text" term-list="yes" proximity="no" relevance="highest" />
  <field name="dc:description" type="text" term-list="yes" proximity="no" relevance="highest" />
</np:definitions>
Defines a field for which to create an index.
<!ELEMENT field EMPTY>
<!ATTLIST field
  name CDATA #REQUIRED
  type (text|long|double|date|time|datetime) "text"
  relevance (normal|high|higher|highest) "normal"
  picture CDATA #IMPLIED
  index (yes|no) "yes"
  exclusive (yes|no) "no"
  term-list (yes|no) "no"
  phrase (yes|no) "no"
  toc-section (yes|no) "no"
  stop-words (yes|no) "no"
  proximity (yes|no) "yes"
  date-2000 (yes|no) "no">
Attribute | Description |
---|---|
name | (Required) Name of the field to define. Field names can be a maximum of 127 characters and must be unique within a content collection. |
type | Data type to assign to the field. A field's data type determines how the NXT 4 server indexes the terms to which the field is applied. You can specify text, long, double, date, time, or datetime. The default is text. |
relevance | Adjusts the relevance weight of a field. You can specify normal, high, higher, or highest. The default is normal. |
picture | Picture string that specifies how to render the field's terms. See Picture Strings for a list of picture strings supported by the various language modules. |
index | Flag indicating whether or not to index terms to which the field is applied. Terms that are indexed can be searched separately from the remainder of the content collection. Fielded terms that are not indexed are not searchable. You can specify yes or no. The default is yes. You must specify yes if yes is also specified for any of the following attributes: toc-section, stop-words, or date-2000. |
exclusive | Flag indicating whether a field's terms can be found only when searching the field. You can specify yes or no. The default is no. If you specify yes, then the field's terms can be found when searching the field, but not when searching the general index. For Folio 4.x users, this is the same as choosing Field Only for the field. Note: If you set the |
term-list | Used in conjunction with a term iterator such as a word-wheel component. When set to yes, a list of the terms in this field is generated. When set to no, no term list is generated and the terms are not listed for this field. |
phrase | Specifies that the terms in a field should be indexed as a phrase instead of as individual terms. The yes value indexes the terms as a phrase and no indexes the terms individually. The default is no. |
toc-section | Flag indicating whether or not the field creates table of contents structure. You can specify yes or no. The default is no. Fields of this type are normally not needed for HTML and are therefore used only when you want to apply fields to create hierarchy for XML or custom HTML structure. toc-section fields must be used with an indexsheet to create headings (see np:index for information on including toc-heading in an indexsheet). If you specify yes, the field's index attribute must also be set to yes. |
stop-words | Flag indicating whether or not to use stop words when building the index for the field. You can specify yes or no. The default is no, which decreases the size of a content collection by reducing the size of the index used for fast phrase searches. The language module used to build a content collection defines the stop words for the language. The stop words used in the English-US version of NXT 4 are: If you specify yes, the field's index attribute must also be set to yes. |
proximity | Flag indicating whether or not the field is a proximity field. You can specify yes or no. The default is yes. Rather than use a proximity field, you can set term-list=yes and proximity=no to generate a separate term list for each field, which enables you to perform an efficient field search and still perform a general search. |
date-2000 | Flag indicating whether or not to allow two-digit years past the year 2000. You can specify yes or no. The default is no. If you specify yes, two-digit years greater than 50 are treated as though they are in the 1900s, and two-digit years less than 50 are treated as though they are in the 2000s. For example, the date 4/5/96 would be interpreted as April 5, 1996, while the date 4/5/05 would be interpreted as April 5, 2005. This attribute is ignored if the field's data type is not date. If you specify yes, the field's index attribute must also be set to yes. |
To apply a field, you must use the indexsheet element to define rules which specify indexing for the field. You must also specify that a document use the indexsheet.
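As a minimal sketch of defining a field and applying it from a rule, the fragment below declares a date field in np:definitions and a rule that applies it. The field name published and the pattern article/published are hypothetical, and the picture attribute is left out because the supported picture strings depend on the language module.

```xml
<xsl:stylesheet>
  <np:definitions>
    <!-- date-2000="yes" requires index="yes"; two-digit years below 50 are read as 20xx -->
    <field name="published" type="date" index="yes" date-2000="yes"/>
  </np:definitions>
  <!-- Apply the field to every published element whose parent is an article element -->
  <xsl:template match="article/published">
    <np:index field="published">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>
</xsl:stylesheet>
```

np:definitions is placed first because it must be the first child of xsl:stylesheet when present.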
The ISO-8601 standard is used for the datetime field type. This field type supports the following formats:
yyyy-MM-ddTHH:mm:ss-HH:mm
yyyy-MM-ddTHH:mm:ss.mmmmmm-HH:mm
yyyy-MM-ddTHH:mm:ss.mmmmmm
yyyy-MM-ddTHH:mm:ss.mmmmmmZ
yyyy-MM-ddTHH:mm:ss
yyyy-MM-dd
The T symbol is used as a delimiter between the date and the time. The supported range of dates is from the year 1400 through 9999.
Note: If a datetime value is invalid, it is not indexed.
Beginning with NXT 4.10, you can set the default timezone for a content collection by adding the timezone attribute to the content-collection tag in the MAK file.
For example, suppose you need to set the UTC +05:00 timezone for your content collection, and the content-collection tag in the MAK file currently looks like this:
<content-collection id="_myContentCollection" title="My Content Collection" filename="mycontentcollection.nxt"></content-collection>
Open the MAK file in a text editor, and make the following changes:
<content-collection id="_myContentCollection" title="My Content Collection" filename="mycontentcollection.nxt" timezone="+05:00"></content-collection>
Note: If the value of the timezone attribute is invalid or is not specified, the UTC timezone is used by default.
See np:definitions.
Declares a query-based facet that generates one facet value covering multiple documents.
<!ELEMENT facet EMPTY> <!ATTLIST facet name CDATA #REQUIRED query CDATA #REQUIRED>
Attribute | Description |
---|---|
name | Name of the facet value, given as the full path including the facet. |
query | Query that defines documents that are included in the facet value. |
Root element that includes the rules for transformation of facet names and values.
<!ELEMENT facet-name-rules (rule*) >
This element has no attributes.
Defines a rule for name transformation. A rule can include child rules. For more information, see the String Transformation Rules article.
<!ELEMENT rule (rule*) > <!ATTLIST rule match CDATA #IMPLIED find CDATA #IMPLIED replace CDATA #IMPLIED stop (yes|no) "no">
Attribute | Description |
---|---|
find | The entry to search for in the source string. |
replace | Used in the Find-replace Transformation. Defines the entry that replaces the entry specified in the find attribute. |
match | The expression to match against the source string. Regular expressions that follow the ECMAScript syntax can be used. |
stop | Stops the transformation process. If the stop attribute is set to yes, this transformation is the last one applied and the recursive algorithm finishes. This helps to increase performance. |
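A hedged sketch of how these attributes might be combined follows. The facet names and values are hypothetical, and the exact way match interacts with find and replace is an assumption here; the String Transformation Rules article is the authority on evaluation order.

```xml
<np:definitions>
  <facet-name-rules>
    <!-- Find-replace transformation: replace underscores with spaces -->
    <rule find="_" replace=" "/>
    <!-- Assumed behavior: applies only to source strings matching the ECMAScript
         regular expression, then stops further transformation -->
    <rule match="^dc:" find="dc:" replace="DC " stop="yes"/>
  </facet-name-rules>
</np:definitions>
```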
Copyright © 2005-2023, Rocket Software, Inc. All rights reserved.